
    Reproducibility of scientific workflows execution using cloud-aware provenance (ReCAP)

    Provenance of scientific workflows has been considered a means to provide workflow reproducibility. However, the provenance approaches adopted so far are not applicable in the Cloud context because the provenance trace lacks information about the Cloud infrastructure. This paper presents a novel approach that collects Cloud-aware provenance and represents it as a graph. Workflow execution reproducibility on the Cloud is determined by comparing the workflow provenance at three levels, i.e., workflow structure, execution infrastructure, and workflow outputs. The experimental evaluation shows that the implemented approach can detect changes in the provenance traces and in the outputs produced by the workflow.
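
    The abstract's three-level comparison lends itself to a small illustration. Below is a minimal, hedged Python sketch of what such a check could look like: the field names, the dictionary layout, and the hash-based output comparison are assumptions made for illustration, not the paper's actual graph model or comparison algorithm.

        # Hypothetical provenance records for two runs of the same workflow,
        # compared at the three levels named in the abstract.
        import hashlib

        def file_hash(path):
            """Hash an output file so outputs can be compared across runs."""
            with open(path, "rb") as f:
                return hashlib.sha256(f.read()).hexdigest()

        def compare_provenance(run_a, run_b):
            """Compare two provenance traces at the three levels."""
            return {
                # Level 1: workflow structure (task dependency edges).
                "structure": run_a["edges"] == run_b["edges"],
                # Level 2: execution infrastructure (the Cloud-aware part,
                # e.g. VM image and flavour used for each task).
                "infrastructure": run_a["infra"] == run_b["infra"],
                # Level 3: workflow outputs, compared by content hash.
                "outputs": {k: file_hash(v) for k, v in run_a["outputs"].items()}
                           == {k: file_hash(v) for k, v in run_b["outputs"].items()},
            }

    Under this reading, a re-execution would be deemed reproducible only when all three entries report True.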

    Integral privacy

    This work was presented at the 15th International Conference on Cryptology and Network Security (2016, Milan, Italy). When considering data provenance, some problems arise from the need to safely handle provenance-related functionality. If modifications have to be performed on a data set due to provenance-related requirements, e.g., removing data from a given user or source, this affects not only the data itself but also all related models and aggregated information obtained from the data. This is especially aggravated when the data are protected using a privacy method (e.g., a masking method), since modifications to the data and the model can leak information originally protected by the privacy method. To be able to evaluate privacy-related problems in data provenance, we introduce the notion of integral privacy and compare it with the well-known definition of differential privacy.
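
    The abstract positions integral privacy against differential privacy. As a reference point, the standard definition of differential privacy requires that a randomized mechanism $M$ satisfy, for all neighbouring data sets $D$ and $D'$ (differing in a single record) and all output sets $S$:

        \[
          \Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S]
        \]

    Informally, and as we read the abstract, integral privacy instead concerns the set of data sets that could have generated a released model or aggregate, so that a provenance-driven modification (e.g., deleting one user's records and recomputing) does not let an observer pin down the data actually used; the paper's formal definition is not reproduced here.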

    A unified framework for managing provenance information in translational research

    Background: A critical aspect of the NIH Translational Research roadmap, which seeks to accelerate the delivery of "bench-side" discoveries to the patient's "bedside," is the management of the provenance metadata that keeps track of the origin and history of data resources as they traverse the path from the bench to the bedside and back. A comprehensive provenance framework is essential for researchers to verify the quality of data, reproduce scientific results published in peer-reviewed literature, validate the scientific process, and associate trust values with data and results. Traditional approaches to provenance management have focused on only partial sections of the translational research life cycle, and they do not incorporate "domain semantics", which is essential to support domain-specific querying and analysis by scientists.
    Results: We identify a common set of challenges in managing provenance information across the pre-publication and post-publication phases of data in the translational research lifecycle. We define the semantic provenance framework (SPF), underpinned by the Provenir upper-level provenance ontology, to address these challenges in the four stages of provenance metadata:
    (a) Provenance collection - during data generation
    (b) Provenance representation - to support interoperability and reasoning, and to incorporate domain semantics
    (c) Provenance storage and propagation - to allow efficient storage and seamless propagation of provenance as the data is transferred across applications
    (d) Provenance query - to support queries of increasing complexity over large data sizes and to support knowledge discovery applications
    We apply the SPF to two exemplar translational research projects, namely the Semantic Problem Solving Environment for Trypanosoma cruzi (T.cruzi SPSE) and the Biomedical Knowledge Repository (BKR) project, to demonstrate its effectiveness.
    Conclusions: The SPF provides a unified framework to effectively manage the provenance of translational research data during the pre- and post-publication phases. This framework is underpinned by an upper-level provenance ontology called Provenir, which is extended to create domain-specific provenance ontologies that facilitate provenance interoperability, seamless propagation of provenance, automated querying, and analysis.
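
    Stage (b), representation, is the most concrete of the four stages, so a small sketch may help. The following Python/rdflib fragment shows the general pattern of subclassing an upper-level provenance ontology to obtain a domain-specific one, as the abstract describes for T.cruzi SPSE; the namespace URIs and all class and property names are illustrative placeholders, not the published Provenir vocabulary.

        # Pattern sketch: derive a domain ontology from an upper-level
        # provenance ontology, record provenance, then query it (stage (d)).
        from rdflib import Graph, Namespace, RDF, RDFS, Literal

        PROVENIR = Namespace("http://example.org/provenir#")  # placeholder URI
        TC = Namespace("http://example.org/tcruzi#")          # placeholder URI

        g = Graph()
        # Domain extension: a knockout sample is a kind of provenance data item.
        g.add((TC.KnockoutSample, RDFS.subClassOf, PROVENIR.data))
        # Provenance triples captured during data generation (stage (a)).
        g.add((TC.sample42, RDF.type, TC.KnockoutSample))
        g.add((TC.sample42, PROVENIR.derived_from, TC.strainCL14))
        g.add((TC.sample42, PROVENIR.generated_by, Literal("plasmid_transfection")))

        # Stage (d): a provenance query over the domain-extended graph.
        q = "SELECT ?src WHERE { ?s <http://example.org/provenir#derived_from> ?src }"
        for row in g.query(q):
            print(row.src)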

    Provenance for Online Decision Making

    It is commonly believed that provenance can be used to form assessments about the quality, reliability, or trustworthiness of data. Once presented with contradictory or questionable information, users can seek further validation by referring to its provenance. While there has been some effort to design principled methods to analyse provenance, the focus has mostly been on offline use. How to use provenance at runtime, i.e., as the application runs, to help users make decisions has barely been investigated. In this paper, we propose a generic and application-independent approach to interpreting the provenance of data in order to make online decisions. We evaluate the system in CollabMap, an online crowd-sourcing mapping application, using it to make decisions about the quality of the application's data and to determine when the crowd's contributions to a task are deemed complete.
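
    The decision logic the abstract hints at can be pictured with a small, hedged sketch: a runtime component accumulates provenance assertions as they arrive and applies a generic rule to decide task completion. The quorum rule and the threshold below are illustrative assumptions, not CollabMap's actual decision model.

        # Hedged sketch of runtime provenance interpretation: a crowd task is
        # deemed complete once enough distinct agents are attributed to it.
        from collections import defaultdict

        class OnlineProvenance:
            def __init__(self, quorum=3):
                self.quorum = quorum
                self.contributors = defaultdict(set)  # task id -> agent ids

            def record(self, task, agent):
                """Called as provenance assertions arrive at runtime."""
                self.contributors[task].add(agent)

            def is_complete(self, task):
                """Online decision: have enough distinct agents contributed?"""
                return len(self.contributors[task]) >= self.quorum

        p = OnlineProvenance(quorum=3)
        for agent in ["alice", "bob", "carol"]:
            p.record("building_17_outline", agent)
        print(p.is_complete("building_17_outline"))  # True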

    A framework for establishing trust in Cloud provenance


    Looking Inside the Black-Box: Capturing Data Provenance Using Dynamic Instrumentation

    Knowing the provenance of a data item helps in ascertaining its trustworthiness. Various approaches have been proposed to track or infer data provenance. However, these approaches either treat an executing program as a black box, limiting the fidelity of the captured provenance, or require developers to modify the program to make it provenance-aware. In this paper, we introduce DataTracker, a new approach to capturing data provenance based on taint tracking, a technique widely used in the security and reverse-engineering fields. Our system is able to identify data provenance relations through dynamic instrumentation of unmodified binaries, without requiring access to, or knowledge of, their source code. Hence, we can track provenance for a variety of well-known applications. Because DataTracker looks inside the executing program, it captures high-fidelity and accurate data provenance.
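
    The taint-tracking idea is easiest to see at the language level, even though DataTracker itself works on unmodified binaries via dynamic instrumentation. The Python sketch below is only a conceptual illustration of taint propagation from sources to sinks; none of it reflects DataTracker's implementation.

        # Conceptual taint tracking: values read from a source carry a taint
        # label, labels propagate through computation, and the labels observed
        # at a sink become provenance relations.
        class Tainted:
            def __init__(self, value, labels):
                self.value = value
                self.labels = set(labels)  # e.g. {"input_a.txt"}

            def __add__(self, other):
                # Propagation rule: a combined value carries the union of the
                # taint labels of its operands.
                if isinstance(other, Tainted):
                    return Tainted(self.value + other.value,
                                   self.labels | other.labels)
                return Tainted(self.value + other, self.labels)

        def read_source(name, value):
            return Tainted(value, {name})

        def write_sink(name, t):
            # Provenance relation inferred at the sink: output <- sources.
            print(f"{name} was-derived-from {sorted(t.labels)}")

        a = read_source("input_a.txt", 2)
        b = read_source("input_b.txt", 3)
        write_sink("result.txt", a + b)  # derived from both inputs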

    Trade-Offs in Automatic Provenance Capture

    Automatic provenance capture from arbitrary applications is a challenging problem. Different approaches to tackling it have evolved, most notably (a) system-event trace analysis, (b) compile-time static instrumentation, and (c) taint flow analysis using dynamic binary instrumentation. Each of these approaches offers different trade-offs in terms of the granularity of captured provenance, integration requirements, and runtime overhead. While these aspects have been discussed separately, a systematic and detailed study quantifying and elucidating them is still lacking. To fill this gap, we begin to explore these trade-offs for representative examples of each approach by means of evaluation and measurement. We base our evaluation on UnixBench, a widely used benchmark suite within systems research. We believe this approach will make our results easier to compare with future studies.
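
    The runtime-overhead dimension of these trade-offs can be approximated with a simple harness. The hedged Python sketch below times a workload bare and under a system-call tracer (approach (a) only), using strace as a stand-in capture mechanism and a dd command as a stand-in workload; it assumes a Linux host with strace installed and is not the paper's actual UnixBench harness.

        # Measure the slowdown a syscall-trace-based capture approach imposes
        # on a workload, UnixBench-style: run bare, run traced, compare.
        import subprocess, time

        def timed(cmd):
            start = time.perf_counter()
            subprocess.run(cmd, check=True,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            return time.perf_counter() - start

        workload = ["dd", "if=/dev/zero", "of=/dev/null", "count=100000"]
        bare = timed(workload)
        traced = timed(["strace", "-f", "-o", "/dev/null"] + workload)
        print(f"runtime overhead of syscall tracing: {traced / bare:.2f}x")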

    Light Touch Identification of Cost/Risk in Complex Socio-Technical Systems

    Information sharing within complex organisations is often a source of considerable cost and risk. In previous work, we showed that the points of highest IT cost/risk within an organisation are often located where data moves from one context of use to another. We proposed a lightweight method of modelling the journeys that data make within an organisation, and showed how to identify risky or costly boundaries. In this paper, we build on this previous work by evaluating the stability and completeness of the three core boundaries of our proposed method with staff from different clinical genomics hospital departments in the UK. Assessing our boundaries in the four new studies, we found that although the core boundaries are stable in the new area of clinical genomics, domain-specific requirements of organisations can drive the need for additional boundaries. Finally, we discuss the feasibility of a general, low-cost process for identifying further boundaries of interest when applying the method to a new domain.
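
    The method's core step, flagging hops where data changes its context of use, can be sketched briefly. In the hedged Python fragment below, the journey hops, the context names, and the "boundary = context change" rule are illustrative stand-ins; the paper's three core boundaries are not reproduced here.

        # Model a data journey as a list of hops between contexts of use and
        # flag every context change as a candidate cost/risk point.
        journey = [
            ("sample_record", "sequencing_lab", "bioinformatics_pipeline"),
            ("variant_calls", "bioinformatics_pipeline", "clinical_reporting"),
            ("report", "clinical_reporting", "clinical_reporting"),  # no move
        ]

        def crosses_boundary(src, dst):
            """A crossing is any hop where the context of use changes."""
            return src != dst

        for item, src, dst in journey:
            if crosses_boundary(src, dst):
                print(f"cost/risk candidate: {item} crosses {src} -> {dst}")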